_______________________________________________________

D/Noise 1.0d
A Digital Audio Denoising Tool
_______________________________________________________

Windows 95 version

(C) 1996 Fast Mathematical Algorithms and Hardware Corporation,
1020 Sherman Avenue, Hamden, CT 06514.
_______________________________________________________

INTRODUCTION

This demonstration version is meant to illustrate some of our current work in the area of audio signal processing. It is in no way suited for commercial denoising. This version will only operate on monophonic 16-bit WAV (Waveform Audio File Format) files. In addition, the input file is limited to one million sample points.

This version of D/Noise does not support algorithm iteration, i.e., the denoising algorithm makes only a single pass through an audio file, separating it into what it considers coherent and what it considers noise. In the next release, you will be able to specify how many times the algorithm will pass through a file and its components to achieve a more thorough separation. In addition, you will be able to save a compressed version of the denoised file as well as apply some basic pre- and post-processing transforms.
_______________________________________________________

Installation
------------

This distribution consists of four files, which should stay together in one folder:

(1) Preliminary documentation (this file)
(2) Dnoise.exe, the shell used to run the algorithm
(3) Denoise.dll, the denoising algorithm in a dynamic link library
(4) Caruso.wav, a sample audio file containing a snippet of Enrico Caruso’s singing, recorded in 1904

Opening a WAV file for denoising
--------------------------------

D/Noise performs a one-pass denoising procedure on an open WAV file. To open a file:

[1] Select "Open..." from the "File" menu.
[2] Locate and open a file using the standard file dialog.
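Before opening a file, it can be useful to check it against the input constraints stated in the Introduction (monophonic, 16-bit, at most one million sample points). The following is a minimal sketch using Python's standard `wave` module; it is not part of D/Noise, the name `check_wav` is hypothetical, and it assumes the one-million limit is inclusive.

```python
import wave

# Input size limit stated in the Introduction (assumed to be inclusive).
MAX_SAMPLES = 1_000_000


def check_wav(path):
    """Return a list of reasons the file breaks the stated limits (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("file is not monophonic")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bits
            problems.append("samples are not 16-bit")
        if wav.getnframes() > MAX_SAMPLES:
            problems.append("file has more than one million sample points")
    return problems
```

An empty list means the file meets all three stated constraints; whether D/Noise itself accepts the file may depend on details not covered here.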
You can run the denoising procedure on the entire file or just a short segment of it. To select a segment of your source file, click-drag across it with the mouse. The toolbar along the top of the main window has a couple of standard controls for scrolling the waveform representation of the file, as well as for zooming in and out.

The smallest length of the signal you can select is determined by the control at the right end of the toolbar at the bottom of the main window. This length also determines the size of the sliding signal window used in the denoising procedure. A length of 1,024 sample points is usually adequate. [NB: the other controls in the bottom toolbar are not functional yet and appear disabled.]

Setting denoising parameters
----------------------------

The outcome of the denoising procedure depends on the settings of various parameters. The exact meaning of these parameters is explained at the end of this document.

To open the denoising algorithm interface, select "Configure..." from the "Denoise" menu. You can select one of two default parameter sets or enter your own. To select a default set, click on the "Default 1" or "Default 2" button in the "Parameters" frame. To set your own parameter values, edit the boxes directly; the [tab] key jumps from one box to the next. You will get an error message if you try to enter a value outside the range of a specific parameter.

Running the denoising procedure
-------------------------------

[1] Setting the output files

The denoising process will leave your original input file untouched and generate two new files. The first of these two new files will contain the coherent ("clean") component of the source file and the second will contain the noisy component. In an extended procedure you could run the process on the noisy file again to extract even more coherent parts and add those to the first clean file.
This version of D/Noise does not yet support this type of iteration (although you can do this "by hand"). In the next release, you will be able to specify a number of iterations for the algorithm.

Use the Select... buttons to select names and locations for the two output files. [Hint: if your files are fairly small and you have RAM to spare, you may want to put the output files on a RAM disk to speed up the process and minimize disk thrashing. Each of the two output files needs the same amount of storage as your source file.]

[2] Starting the procedure

Click the [Denoise All] or the [Denoise Selection] button at the bottom of the dialog box. The procedure starts and progress information is displayed. You can abort the procedure at any time by clicking the [Stop] button. Note that it may take a little while before the algorithm stops, as event polling is kept to a minimum in order not to slow down the process. When finished, close the dialog box by clicking the [Done] button. You can now open and see/hear the resulting coherent and noise files.
_______________________________________________________

About the D/Noise Algorithm and its Control Parameters
by Maxim J. Goldberg and Igor Popovic
_______________________________________________________

INTRODUCTION

The D/Noise family of algorithms was developed for the purpose of removing noise from one-dimensional signals, in particular speech or music signals, by the method of denoising proposed by R. Coifman and V. Wickerhauser. One starts with a library of orthonormal waveforms, which typically includes wavelet packets and local trigonometric bases. A signal is expanded in each basis, and a cost is assigned to the expansion. The basis giving rise to the least cost is chosen, the coefficients are ordered by magnitude, and a number of the leading terms are kept as the coherent part, based on a predetermined threshold cost of the remaining terms.
These leftover terms constitute by definition the noisy part of the signal, and can be treated as a new signal which can in turn be expanded and separated into its coherent and noisy components.

In D/Noise, we use only one library of bases, those arising from the dyadic decomposition tree obtained by constructing local sines on the frequencies of a smoothly cut window from the signal. A "best" basis is chosen by comparing the cost of a parent node to the sum of the costs of its two children. In D/Noise, the cost function can be chosen to be Shannon entropy or the lp norm of the coefficients of an expansion.

We attempt to deal with numerical artifacts arising from the processing by (1) allowing shifts in time and frequency, and (2) segmenting into large windows and using only the uncorrupted middle core. The large window we are using is 4 times the size of the core. For example, if the user selects a signal window of 1,024 samples, internally we slide and denoise a window of 4,096 samples and use only its 1,024-sample-wide core in the reconstruction. This strategy has proven to give more pleasing results than any other "fancy" windowing.

PARAMETERS

(1) Window size
This parameter determines the number of consecutive samples processed at one time. Internally, the algorithm slides two "windows" of the selected width through the signal, offset by 1/2 their width. In addition, each window is extended on both sides and only the core is used in the reconstruction after denoising. The windows should not be too narrow, since good frequency resolution is desirable, in particular for music. Nor should the windows be too wide, since information spread over time might mask local occurrences. For music, 512, 1024, or perhaps 2048 samples seem to be the sizes to consider first.

(2) Log2 of reach
This is the log-base-2 of the size of the smallest interval to be considered by the local trigonometric transform decomposition tree.
For example, the preset of 4 gives 2^4 = 16 samples as the smallest interval to be considered.

(3) Energy threshold
This is the energy threshold for discarding coefficients from the extracted signal basis. This number, typically 0.0001 or 0.000001, means that in the chosen basis, those coefficients of size less than (energy threshold) * (energy of window segment) are set to zero and thus discarded.

(4) Entropy
A real number alpha that determines which entropy function is used to separate out the noise component: alpha = 0.0 selects Shannon entropy, while 0 < alpha < 1 selects the little-l-sub-p norm, where p = 2*alpha. For example, entering 0.5 results in the l1 norm being used.

(5) Entropy ratio
This real number specifies the threshold mentioned in (3) above. A ratio of 1.0 or higher means that all the entries of the expansion of the window segment will be considered coherent, while a ratio of 0.0 or less means that the entire signal coming from each window will be considered noise. For music, a good entropy ratio to start testing with may be between 0.3 and 0.4 if using Shannon entropy (alpha = 0.0 in (4) above); 0.7 works well for alpha = 0.5.

(6) Time shift
A specific value k means the signal is padded with k zeros in front, the whole program is run, and then the output files are shifted back to the left by k samples. The purpose of different shifts in time is to make the signal window cuts occur in different places. It is recommended that any shifts chosen be prime or nearly prime numbers, without high powers of two occurring in their factorization, and that each shift be less than one half of the window size set in (1) above.

As mentioned previously, this version does not yet support any type of iteration. If you run the algorithm on the same file, specifying a different time shift on each run, you will have to average the resulting files by hand, e.g., using an audio file mixing utility (a free utility package for AIFF files will be included with the next release).

(7) Frequency shifts
In this field you can enter up to nine integers, each specifying a shift in the frequency domain of the signal. As in (6), the purpose is to average out cutting artifacts from the spectrum when performing the adapted local trigonometric transform on the signal's frequencies. Small primes are recommended; the default presets should suffice.
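
The by-hand averaging described under (6) Time shift can be sketched with Python's standard library. This is only an illustration, not part of D/Noise: the function names `read_mono16` and `average_wavs` are hypothetical, and the sketch assumes that the coherent output files from the separate runs are mono 16-bit WAV files with the same length and sample rate.

```python
import array
import sys
import wave


def read_mono16(path):
    """Read a mono 16-bit WAV file; return (sample_rate, samples)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected a mono 16-bit WAV file")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, samples


def average_wavs(in_paths, out_path):
    """Write the sample-by-sample mean of several equal-length WAV files."""
    runs = [read_mono16(p) for p in in_paths]
    rate, first = runs[0]
    if any(r != rate or len(s) != len(first) for r, s in runs):
        raise ValueError("all inputs must share sample rate and length")
    n = len(runs)
    columns = zip(*(s for _, s in runs))  # one tuple of samples per time step
    mean = array.array("h", (round(sum(col) / n) for col in columns))
    if sys.byteorder == "big":
        mean.byteswap()  # convert back to little-endian before writing
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(mean.tobytes())
```

For example, `average_wavs(["shift3.wav", "shift7.wav"], "averaged.wav")` would combine two runs made with different time shifts (the file names are placeholders). A plain mean is used here; a mixing utility may weight or dither the result differently.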